{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "JPUfjAlRUB8w"
      },
      "source": [
        "# Use banks to cache prompts with Anthropic API\n",
        "\n",
        "<a target=\"_blank\" href=\"https://colab.research.google.com/github/masci/banks/blob/main/cookbook/Prompt_Caching_with_Anthropic.ipynb\">\n",
        "  <img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/>\n",
        "</a>\n",
        "\n",
        "Prompt caching allows you to store and reuse context within your prompt saving time and money. When using the prompt cache feature from Anthropic, the chat messages have to be expressed in blocks rather than simple text, so that you can designate one of the for being cached.\n",
        "\n",
        "Let's see how Banks makes this super easy."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "4w6N2F8gGF7q"
      },
      "outputs": [],
      "source": [
        "!pip install banks"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QF9UZVjaUsK1"
      },
      "source": [
        "To simulate a huge prompt, we'll provide Claude with a full book in the context, \"Pride and prejudice\"."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "Ayno0BHEStAm"
      },
      "outputs": [],
      "source": [
        "!curl -O https://www.gutenberg.org/cache/epub/1342/pg1342.txt"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Read the whole book and assign to the `book` variable."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "N2EcJ1P6Svx6"
      },
      "outputs": [],
      "source": [
        "with open(\"pg1342.txt\") as f:\n",
        "    book = f.read()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "xzTdJJubVGkL"
      },
      "source": [
        "With Banks we can define which part of the prompt specifically will be cached. \n",
        "Directly from the prompt template text, we can use the `cache_control` built-in filter to tell Anthropic that\n",
        "we want to cache the prompt up to and including the `{{ book }}` template block."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "7PO4397MSm-f"
      },
      "outputs": [],
      "source": [
        "import time\n",
        "\n",
        "import litellm\n",
        "from litellm import completion\n",
        "\n",
        "from banks import Prompt\n",
        "\n",
        "\n",
        "tpl = \"\"\"\n",
        "{% chat role=\"user\" %}\n",
        "Analyze this book:\n",
        "\n",
        "{# Only this part of the message content (including the book content) will be cached #}\n",
        "{{ book | cache_control(\"ephemeral\") }}\n",
        "\n",
        "{# This part won't be cached instead #}\n",
        "\n",
        "What is the title of this book? Only output the title.\n",
        "{% endchat %}\n",
        "\"\"\"\n",
        "\n",
        "p = Prompt(tpl)\n",
        "# render the prompt in form of a list of Banks' ChatMessage\n",
        "chat_messages = p.chat_messages({\"book\": book})\n",
        "# dump the ChatMessage objects into dictionaries to pass to LiteLLM\n",
        "messages_dict = [m.model_dump(exclude_none=True) for m in chat_messages]"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Let's call the Anthropic API for the first time. We don't expect any difference from a normal call without caching."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "3hsJHr29ThLj"
      },
      "outputs": [],
      "source": [
        "# First call has no cache\n",
        "start_time = time.time()\n",
        "response = completion(model=\"anthropic/claude-3-5-sonnet-20240620\", messages=messages_dict)\n",
        "\n",
        "print(f\"Non-cached API call time: {time.time() - start_time:.2f} seconds\")\n",
        "print(response.usage)\n",
        "print(response)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "Now the book content is in the cache, and the difference in time and cost repeating the previous call is obvious."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "8F75jH4BTZ6U"
      },
      "outputs": [],
      "source": [
        "# Second call, the book is cached\n",
        "start_time = time.time()\n",
        "response = completion(model=\"anthropic/claude-3-5-sonnet-20240620\", messages=messages_dict)\n",
        "\n",
        "print(f\"Cached API call time: {time.time() - start_time:.2f} seconds\")\n",
        "print(response.usage)\n",
        "print(response)"
      ]
    }
  ],
  "metadata": {
    "colab": {
      "provenance": []
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}
